Abstract

With $X^*$ denoting a random variable with the $X$-size bias distribution,
what are all distributions for $X$ such that it is possible to have
$X^* = X + Y$, $Y \ge 0$, with $X$ and $Y$ independent? We give the answer,
due to Steutel [17], and also discuss the relations of size biasing to the
waiting time paradox, renewal theory, sampling, tightness and uniform
integrability, compound Poisson distributions, infinite divisibility, and
the lognormal distributions.

1 The Waiting Time Paradox



Here is the "waiting time paradox," paraphrased from Feller [9], volume II,
section 1.4: Buses arrive in accordance with a Poisson process, so that the
interarrival times are given by independent random variables, having the
exponential distribution $\mathbb{P}(X > s) = e^{-s}$ for $s > 0$, with mean
$\mathbb{E}X = 1$. I now arrive at an arbitrary time $t$. What is the
expectation $\mathbb{E}W_t$ of my waiting time $W_t$ for the next bus? Two
contradictory answers stand to reason: (a) The lack of memory of the
exponential distribution, i.e. the property
$\mathbb{P}(X > r + s \mid X > s) = \mathbb{P}(X > r)$, implies that
$\mathbb{E}W_t$ should not be sensitive to the choice $t$, so that
$\mathbb{E}W_t = \mathbb{E}W_0 = 1$. (b) The time of my arrival is "chosen at
random" in the interval between two consecutive buses, and for reasons of
symmetry $\mathbb{E}W_t = 1/2$.

The resolution of this paradox requires an understanding of size 
biasing. We will first present some simpler examples of size biasing, 
before returning to the waiting time paradox and its resolution. 

Size biasing occurs in many unexpected contexts, such as statis- 
tical estimation, renewal theory, infinite divisibility of distributions, 
and number theory. The key relation is that to size bias a sum with 
independent summands, one needs only size bias a single summand, 
chosen at random. 

2 Size Biasing in Sampling 

We asked students who ate lunch in the cafeteria "How many peo- 
ple, including yourself, sat at your table?" Twenty percent said they 
ate alone, thirty percent said they ate with one other person, thirty 
percent said they ate at a table of three, and the remaining twenty 
percent said they ate at a table of four. From this information, would 
it be correct to conclude that twenty percent of the tables had only 
one person, thirty percent had two people, thirty percent had three 
people, and twenty percent had four people? 

Certainly not! The easiest way to think about this situation is 
to imagine 100 students went to lunch, and we interviewed them all. 
Thus, twenty students ate alone, using 20 tables, thirty students ate 
in pairs, using 15 tables, thirty students ate in trios, using 10 tables, 
and twenty students ate in groups of four, using 5 tables. So there 
were 20 + 15 + 10 + 5 = 50 occupied tables, of which forty percent had 
only one person, thirty percent had two people, twenty percent had 
three people, and ten percent had four people. 

A probabilistic view of this example begins by considering the experiment
where an occupied table is selected at random and the number of people, $X$,
at that table is recorded. From the analysis so far, we see that since 20 of
the 50 occupied tables had only a single individual, $\mathbb{P}(X = 1) = .4$,
and so forth. A different experiment, one related to but not to be confused
with the first, would be to select a person at random, and record the total
number $X^*$ at the table where this individual had lunch. Our story began
with the information $\mathbb{P}(X^* = 1) = .2$, $\mathbb{P}(X^* = 2) = .3$,
and so forth, and the distributions of the random variables $X$ and $X^*$ are
given side by side in the following table:

    k        P(X = k)    P(X* = k)
    1          .4           .2
    2          .3           .3
    3          .2           .3
    4          .1           .2
    total     1.0          1.0



The distributions of the random variables $X$ and $X^*$ are related; for $X$
each table has the same chance to be selected, but for $X^*$ the chance to
select a table is proportional to the number of people who sat there. Thus
$\mathbb{P}(X^* = k)$ is proportional to $k \times \mathbb{P}(X = k)$;
expressing the proportionality with a constant $c$ we have
$\mathbb{P}(X^* = k) = c\,k\,\mathbb{P}(X = k)$. Since
$1 = \sum_k \mathbb{P}(X^* = k) = c \sum_k k\,\mathbb{P}(X = k) = c\,\mathbb{E}X$,
we have $c = 1/\mathbb{E}X$ and

$$\mathbb{P}(X^* = k) = \frac{k\,\mathbb{P}(X = k)}{\mathbb{E}X}, \qquad k = 0, 1, 2, \ldots \tag{1}$$



Since the distribution of X* is weighted by the value, or size, of X, 
we say that X* has the X size biased distribution. 
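
The relation (1) and its inverse can be checked on the cafeteria numbers. The following is a minimal sketch, not part of the original paper; the function names `size_bias_pmf` and `unbias_pmf` are illustrative assumptions, and exact rational arithmetic is used so that the comparison is exact.

```python
from fractions import Fraction

def size_bias_pmf(pmf):
    """Return the size biased pmf k * P(X = k) / EX of a nonnegative pmf, as in (1)."""
    mean = sum(k * p for k, p in pmf.items())
    if mean <= 0:
        raise ValueError("size biasing needs a strictly positive mean")
    return {k: k * p / mean for k, p in pmf.items() if k * p > 0}

def unbias_pmf(star_pmf):
    """Invert (1): P(X = k) is proportional to P(X* = k) / k."""
    weights = {k: p / k for k, p in star_pmf.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Distribution of X: occupancy of a randomly chosen occupied table.
tables = {1: Fraction(4, 10), 2: Fraction(3, 10), 3: Fraction(2, 10), 4: Fraction(1, 10)}
students = size_bias_pmf(tables)   # distribution of X*: what a random student reports
recovered = unbias_pmf(students)   # back to the per-table distribution
```

Here `students` reproduces the survey percentages (.2, .3, .3, .2), and `recovered` equals `tables`, which is the correction the cafeteria example carries out by hand.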

In many statistical sampling situations, like the one above, care 
must be taken so that one does not inadvertently sample from the size 
biased distribution in place of the one intended. For instance, suppose 
we wanted to have information on how many voice telephone lines 
are connected at residential addresses. Calling residential telephone 
numbers by random digit dialing and asking how many telephone lines 
are connected at the locations which respond is an instance where one 
would be observing the size biased distribution instead of the one 
desired. It's three times more likely for a residence with three lines to 
be called than a residence with only one. And the size bias distribution 
never has any mass at zero, so no one answers the phone and tells a 
surveyor that there are no lines at the address just reached! But the 
same bias exists more subtly in other types of sampling more akin to 
the one above: what if we were to ask people at random how many 
brothers and sisters they have, or how many fellow passengers just 
arrived with them on their flight from New York? 



3 Size Bias in General 

The examples in Section 2 involved nonnegative integer valued random
variables. In general, a random variable $X$ can be size biased if and only
if it is nonnegative, with finite and positive mean, i.e.
$1 = \mathbb{P}(X \ge 0)$ and $0 < \mathbb{E}X < \infty$. We will henceforth
assume that $X$ is nonnegative, with $a := \mathbb{E}X \in (0, \infty)$. For
such $X$, we say $X^*$ has the $X$ size biased distribution if and only if
for all bounded continuous functions $g$,

$$\mathbb{E}g(X^*) = \frac{1}{a}\,\mathbb{E}(X g(X)). \tag{2}$$

It is easy to see that, as a condition on distributions, (2) is equivalent to

$$dF_{X^*}(x) = \frac{x\,dF_X(x)}{a}.$$

In particular, when $X$ is discrete with probability mass function $f$, or
when $X$ is continuous with density $f$, the formula

$$f_{X^*}(x) = \frac{x f(x)}{a} \tag{3}$$

applies; (1) is a special case of the former.

If (2) holds for all bounded continuous $g$, then by monotone convergence it
also holds for any function $g$ such that $\mathbb{E}|Xg(X)| < \infty$. In
particular, taking $g(x) = x^n$, we have

$$\mathbb{E}(X^*)^n = \mathbb{E}X^{n+1}/\mathbb{E}X \tag{4}$$

whenever $\mathbb{E}|X^{n+1}| < \infty$. Apart from the extra scaling by
$1/\mathbb{E}X$, (4) says that the sequence of moments of $X^*$ is the
sequence of moments of $X$, but shifted by one. One way to recognize size
biasing is through the "shift of the moment sequence;" we give an example in
Section 15.
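
The moment shift (4) is easy to confirm on a small discrete example. The sketch below is illustrative only: the three-point distribution `pmf` is a hypothetical choice, and exact fractions are used so that equality can be tested without rounding.

```python
from fractions import Fraction

def moment(pmf, n):
    """n-th moment of a discrete distribution given as {value: probability}."""
    return sum(Fraction(k) ** n * p for k, p in pmf.items())

def size_bias_pmf(pmf):
    """Size biased pmf x * f(x) / a, as in (3); atoms at zero receive no mass."""
    mean = moment(pmf, 1)
    return {k: k * p / mean for k, p in pmf.items() if k > 0}

# Hypothetical distribution with an atom at zero.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 2)}
star = size_bias_pmf(pmf)

# (4): E (X*)^n = E X^(n+1) / E X, checked for n = 0, ..., 5.
shift_holds = all(
    moment(star, n) == moment(pmf, n + 1) / moment(pmf, 1) for n in range(6)
)
```

The case $n = 0$ is just the statement that the size biased masses sum to one.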
In this paper, we ask and solve the following problem: what are all possible
distributions for $X \ge 0$ with $0 < \mathbb{E}X < \infty$, such that there
exists a coupling in which

$$X^* = X + Y, \quad Y \ge 0, \quad \text{and } X, Y \text{ are independent.} \tag{5}$$

Resolving this question on independence leads us to the infinitely divisible
and compound Poisson distributions. These concepts by themselves can be quite
technical, but in our size biasing context they are relatively easy. We also
present some background information on size biasing, in particular how it
arises in applications including statistics. The answer to (5) comes from
Steutel 1973 [17]; see section 10 for more of the history.

A beautiful treatment of size biasing for branching processes is [14] by
Lyons, Pemantle, and Peres. Size biasing has a connection with Stein's method
for obtaining error bounds when approximating the distributions of sums by
the Normal ([4] Baldi and Rinott, 1989; [3] Baldi, Rinott, and Stein, 1989;
and [10] Goldstein and Rinott, 1996), and the Poisson ([6] Barbour, Holst,
and Janson, 1992).

To more fully explain the term "increment" in the title, letting
$g(x) = \mathbf{1}(x > t)$ in (2) for some fixed $t$, we find that

$$\mathbb{P}(X^* > t) = \frac{1}{a}\,\mathbb{E}(X \mathbf{1}(X > t)) \ge \frac{1}{a}\,\mathbb{E}X\,\mathbb{E}\mathbf{1}(X > t) = \mathbb{P}(X > t).$$

The inequality above is the special case $f(x) = x$, $g(x) = \mathbf{1}(x > t)$
of Chebyshev's correlation inequality:
$\mathbb{E}(f(X)g(X)) \ge \mathbb{E}f(X)\,\mathbb{E}g(X)$ for any random
variable and any two increasing functions $f, g$. The condition
$\mathbb{P}(X^* > t) \ge \mathbb{P}(X > t)$ for all $t$ is described as
"$X^*$ lies above $X$ in distribution," and implies that there exist
couplings of $X^*$ and $X$ in which always $X^* \ge X$. Writing $Y$ for the
difference, we have

$$X^* = X + Y, \quad Y \ge 0. \tag{6}$$

The simplest coupling satisfying (6) is based on the "quantile
transformation," constructing each of $X$ and $X^*$ from the same uniform
random variable $U$ on $(0,1)$. Explicitly, with cumulative distribution
function $F$ defined by $F(t) := \mathbb{P}(X \le t)$, and its "inverse"
defined by $F^{-1}(u) := \sup\{t : F(t) \le u\}$, the coupling given by
$X = F^{-1}(U)$, $X^* = (F^*)^{-1}(U)$, where $F^*$ is the cumulative
distribution function of $X^*$, satisfies (6).
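
The quantile coupling can be simulated directly for a discrete distribution. The sketch below assumes the table-occupancy distribution from Section 2 and a fixed random seed; the helper names are hypothetical. For a distribution supported on finitely many atoms, $\sup\{t : F(t) \le u\}$ is the smallest atom $k$ with $F(k) > u$, which is what `quantile` returns.

```python
import random

def size_bias_pmf(pmf):
    """Size biased pmf k * P(X = k) / EX."""
    mean = sum(k * p for k, p in pmf.items())
    return {k: k * p / mean for k, p in pmf.items()}

def quantile(pmf, u):
    """F^{-1}(u) = sup{t : F(t) <= u}: the smallest atom k with F(k) > u."""
    cumulative = 0.0
    atoms = sorted(pmf)
    for k in atoms:
        cumulative += pmf[k]
        if cumulative > u:
            return k
    return atoms[-1]  # guard against floating point round-off near u = 1

pmf = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}   # table occupancy X from Section 2
star = size_bias_pmf(pmf)                # distribution of X*

rng = random.Random(0)                   # fixed seed, purely for reproducibility
pairs = []
for _ in range(10_000):
    u = rng.random()
    pairs.append((quantile(pmf, u), quantile(star, u)))  # the same U drives both

always_above = all(x_star >= x for x, x_star in pairs)
```

In this coupling `always_above` is true, but $Y = X^* - X$ is a function of the same $U$ as $X$, so nothing here suggests that $X$ and $Y$ are independent.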

In general (6) determines neither the joint distribution of $X$ and $Y$, nor
the marginal distribution of $Y$, nor whether or not $X$ and $Y$ are
independent. It is a further restriction on the distribution of $X$ to
require that (6) be achievable with $X, Y$ independent.

When $Z \sim \mathrm{Po}(\lambda)$, i.e. $Z$ is Poisson with
$\mathbb{P}(Z = k) = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, 2, \ldots$, we have
$Z^* \stackrel{d}{=} Z + 1$, where the notation $\stackrel{d}{=}$ denotes
equality in distribution. The reader can check

$$Z^* \stackrel{d}{=} Z + 1 \tag{7}$$

directly using (2); a conceptual derivation is given in Example 1) in
Section 16.1. Scaling by a factor $y > 0$ in general means to replace $X$ by
$yX$, and it follows easily from (2) that

$$(yX)^* = y(X^*). \tag{8}$$

For our case, multiplying (7) by $y > 0$ yields the implication, for Poisson
$Z$,

$$\text{if } X = yZ, \text{ then } X^* \stackrel{d}{=} X + y. \tag{9}$$

Hence, for each $\lambda > 0$ and $y > 0$, (9) gives an example where (5) is
satisfied with $Y$ a constant random variable, which is independent of every
random variable. In a very concrete sense, all solutions of (5) can be built
up from these examples, but to accomplish that we must first review how to
size bias sums of independent random variables.
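
The identity (7) can also be checked numerically from the mass function. This is a minimal sketch with a hypothetical rate `lam` and a truncation point `K` chosen large enough that the neglected tail is negligible.

```python
import math

def poisson_pmf(lam, k):
    """P(Z = k) for Z ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2.5   # hypothetical rate
K = 60      # truncation point

# Size biased pmf from (1): k * P(Z = k) / EZ, with EZ = lam.
biased = [k * poisson_pmf(lam, k) / lam for k in range(K)]
# Pmf of Z + 1: P(Z + 1 = k) = P(Z = k - 1), with no mass at k = 0.
shifted = [0.0] + [poisson_pmf(lam, k - 1) for k in range(1, K)]

max_gap = max(abs(b - s) for b, s in zip(biased, shifted))
```

The two lists agree up to floating point error, reflecting the algebraic identity $k\,e^{-\lambda}\lambda^k/(k!\,\lambda) = e^{-\lambda}\lambda^{k-1}/(k-1)!$.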

4 How to size bias a sum of independent random variables

Consider a sum $X = X_1 + \cdots + X_n$, with independent non-negative
summands $X_i$, and suppose that $\mathbb{E}X_i = a_i$, $\mathbb{E}X = a$.
Write $S_i = X - X_i$, so that $S_i$ and $X_i$ are independent, and also take
$S_i$ and $X_i^*$ to be independent; this is used to obtain the final
equality in (10) below.

We have for all bounded functions $g$,

$$\mathbb{E}g(X^*) = \mathbb{E}(Xg(X))/a = \sum_{i=1}^n (a_i/a)\,\mathbb{E}(X_i\,g(S_i + X_i))/a_i = \sum_{i=1}^n (a_i/a)\,\mathbb{E}g(S_i + X_i^*). \tag{10}$$

The result in (10) says precisely that $X^*$ can be represented by the
mixture of the distributions $S_i + X_i^*$ with mixture probabilities
$a_i/a$. In words, in order to size bias the sum $X$ with independent
summands, we first pick an independent index $I$ with probability
proportional to its expectation, that is, with distribution
$\mathbb{P}(I = i) = a_i/a$, and then size bias only the summand $X_I$. Or,
with $X_1, \ldots, X_n, X_1^*, \ldots, X_n^*$ and $I$ all independent,

$$(X_1 + X_2 + \cdots + X_n)^* = X_1 + \cdots + X_{I-1} + X_I^* + X_{I+1} + \cdots + X_n. \tag{11}$$



For the special case where the summands $X_i$ are not only independent but
also identically distributed, or i.i.d., this recipe simplifies. In this case
it does not matter which summand is biased, as all the distributions in the
mixture are the same; hence for any $i = 1, \ldots, n$,
$X^* = X_1 + \cdots + X_{i-1} + X_i^* + X_{i+1} + \cdots + X_n$. In
particular we may use $i = 1$ so that

$$(X_1 + X_2 + \cdots + X_n)^* = X_1^* + X_2 + X_3 + \cdots + X_n. \tag{12}$$
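
The mixture recipe (11) can be verified exactly for two summands with different distributions. The sketch below uses two hypothetical discrete summands `x1` and `x2` and exact fractions; it compares the size biased pmf of the sum with the mixture of "bias one summand" distributions, weighted by $a_i/a$.

```python
from fractions import Fraction
from itertools import product

def mean(pmf):
    return sum(k * p for k, p in pmf.items())

def size_bias_pmf(pmf):
    """Size biased pmf k * P(X = k) / EX, dropping atoms that get zero mass."""
    a = mean(pmf)
    return {k: k * p / a for k, p in pmf.items() if k * p > 0}

def convolve(p, q):
    """Pmf of the sum of two independent variables with pmfs p and q."""
    out = {}
    for (i, pi), (j, qj) in product(p.items(), q.items()):
        out[i + j] = out.get(i + j, 0) + pi * qj
    return out

# Two hypothetical independent, non-identically distributed summands.
x1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
x2 = {1: Fraction(1, 3), 4: Fraction(2, 3)}

lhs = size_bias_pmf(convolve(x1, x2))        # (X1 + X2)*

a1, a2 = mean(x1), mean(x2)
a = a1 + a2
branch1 = convolve(size_bias_pmf(x1), x2)    # X1* + X2, chosen with probability a1/a
branch2 = convolve(x1, size_bias_pmf(x2))    # X1 + X2*, chosen with probability a2/a
rhs = {}
for weight, branch in ((a1 / a, branch1), (a2 / a, branch2)):
    for k, p in branch.items():
        rhs[k] = rhs.get(k, 0) + weight * p
```

With these inputs both sides put mass 1/21, 2/21, 8/21, 10/21 on the values 1, 2, 4, 5.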

5 Waiting for a bus: the renewal theory connection

Renewal theory provides a conceptual explanation of the identity (12) and at
the same time gives an explanation of the waiting time paradox. Let the
interarrival times of our buses in Section 1 be denoted $X_i$, so that buses
arrive at times $X_1, X_1 + X_2, X_1 + X_2 + X_3, \ldots$, and assume only
that the $X_i$ are i.i.d., strictly positive random variables with finite
mean; the paradox presented earlier was the special case with $X_i$
exponentially distributed. Implicit in the story of my arrival time $T$ as
"arbitrary" is that my precise arrival time does not matter, and that there
should be no relation between my arrival time and the schedule of buses. One
way to model this assumption is to choose $T$ uniformly from $0$ to $l$,
independent of $X_1, X_2, \ldots$, and then take the limit as $l \to \infty$;
informally, just imagine some very large $l$. Such a $T$ corresponds to
throwing a dart at random from a great distance toward the real line, which
has been subdivided into intervals of lengths $X_i$. Naturally the dart is
twice as likely to land in a given interval of length two than one of length
one, and generally $x$ times as likely to land in a given interval of length
$x$ as one of length one. In other words, if the interarrival times $X_i$
have a distribution $dF(x)$, the distribution of the length of the interval
where the dart lands is proportional to $x\,dF(x)$. The constant of
proportionality must be $1/a$, in order to make a legitimate distribution, so
the distribution of the interval where the dart lands is the distribution of
$X^*$.

The conceptual explanation of identity (12) is the following. Suppose that
every $n$-th bus is bright blue, so that the waiting time between bright blue
buses is the sum over a block of $n$ successive interarrival times. Again,
the random time $T$ finds itself in an interval whose length is distributed
as the size biased distribution of the interarrival times; the lengths of the
neighboring intervals are not affected. But by considering the variables as
appearing in blocks of $n$, the random time $T$ must also find itself in a
block distributed as $(X_1 + \cdots + X_n)^*$. Since only the interval
containing one of the interarrival times has been size biased, this sum must
be equal in distribution to
$X_1 + \cdots + X_{i-1} + X_i^* + X_{i+1} + \cdots + X_n$.

A more precise explanation of our waiting time paradox is based on the
concept of stationarity: randomizing the schedule of buses so that I can
arrive at an arbitrary time $t$, and specifying a particular $t$ does not
influence how long I must wait for the next bus. The simple process with
arrivals at times $X_1, X_1 + X_2, X_1 + X_2 + X_3, \ldots$ is in general not
stationary; and the distribution of the time $W_t$ that we wait from time $t$
until the arrival of the next bus varies with $t$. We can, however, cook up a
stationary process from this simple process by a modification suggested by
size biasing. For motivation, recall the case where I arrive at $T$ chosen
uniformly from $(0, l)$. In the limit as $l \to \infty$ the interval
containing $T$ has length distributed as $X^*$, and my arrival within this
interval is 'completely random.' That is, I wait $UX^*$ for the next bus, and
I missed the previous bus by $(1 - U)X^*$, where $U$ is uniform on $(0,1)$
and independent of $X^*$. Thus it is plausible that one can form a stationary
renewal process by the following recipe. Extend $X_1, X_2, \ldots$ to an
independent, identically distributed sequence
$\ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots$. Let $X_0^*$ be the size
biased version of $X_0$ and let $U$ be chosen uniformly in $(0,1)$, with all
variables independent. The origin is to occupy an interval of length $X_0^*$,
and the location of the origin is to be uniformly distributed over this
interval; hence buses arrive at time $UX_0^*$ and $-(1 - U)X_0^*$. Using
$X_1, X_2, \ldots$ and $X_{-1}, X_{-2}, \ldots$ as interarrival times on the
positive and negative side, we obtain a process by setting bus arrivals at
the positive times $UX_0^*,\ UX_0^* + X_1,\ UX_0^* + X_1 + X_2, \ldots$, and
at the negative times
$-(1 - U)X_0^*,\ -((1 - U)X_0^* + X_{-1}),\ -((1 - U)X_0^* + X_{-1} + X_{-2}), \ldots$,
and it can be proved that this process is stationary.

The interval which covers the origin has expected length
$\mathbb{E}X_0^* = \mathbb{E}X_0^2/\mathbb{E}X_0$ (by (4) with $n = 1$), and
the ratio of this to $\mathbb{E}X_0$ is
$\mathbb{E}X_0^*/\mathbb{E}X_0 = \mathbb{E}X_0^2/(\mathbb{E}X_0)^2$. By
Cauchy-Schwarz, this ratio is at least 1; and every value in $[1, \infty]$ is
feasible. Note that my expected waiting time is
$\mathbb{E}W_T = \mathbb{E}W_0 = \mathbb{E}(UX_0^*) = (1/2)\,\mathbb{E}X_0^*$,
so the ratio of my expected waiting time to the average time between buses
can be any value between 1/2 and infinity, depending on the distribution of
the interarrival times.
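
The formula $\mathbb{E}W_T = (1/2)\,\mathbb{E}X_0^* = \mathbb{E}X_0^2/(2\,\mathbb{E}X_0)$ can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than a proof: it uses a hypothetical interarrival law, uniform on $(0, 2)$, one long simulated schedule, a finite horizon standing in for $l \to \infty$, and a fixed seed.

```python
import bisect
import random

def mean_wait(draw_gap, horizon, samples, rng):
    """Estimate E W_T for T uniform on (0, horizon), with gaps drawn by draw_gap."""
    arrivals, t = [], 0.0
    while t <= horizon:              # the last arrival lands beyond the horizon
        t += draw_gap(rng)
        arrivals.append(t)
    total = 0.0
    for _ in range(samples):
        T = rng.uniform(0.0, horizon)
        nxt = arrivals[bisect.bisect_right(arrivals, T)]   # first bus after T
        total += nxt - T
    return total / samples

rng = random.Random(1)               # fixed seed, purely for reproducibility
# Hypothetical interarrival law: uniform on (0, 2), so EX = 1 and EX^2 = 4/3.
estimate = mean_wait(lambda r: r.uniform(0.0, 2.0), 100_000.0, 100_000, rng)
predicted = (4 / 3) / (2 * 1)        # (1/2) E X* = E X^2 / (2 E X) = 2/3
```

For this law the predicted mean wait is 2/3 of the mean gap, strictly between the two naive answers 1/2 and 1 from Section 1; replacing the lambda by `lambda r: r.expovariate(1.0)` should move the estimate toward 1.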

The exponential case is very special, where strange and wonderful
"coincidences" effectively hide all the structure involved in size biasing
and stationarity. The distribution of $X_0^*$, obtained by size biasing the
unit exponential, has density $xe^{-x}$ for $x > 0$, using (3) with $a = 1$.
This distribution is known as Gamma(1,2). In particular,
$\mathbb{E}X_0^* = \int_0^\infty x(xe^{-x})\,dx = 2$, and splitting this in
half for "symmetry" as in Feller's answer (b) gives 1 as the expected time I
must wait for the next bus. Furthermore, the independent uniform $U$ splits
that Gamma(1,2) variable $X_0^*$ into $UX_0^*$ and $(1 - U)X_0^*$, and these
turn out to be independent, and each having the original exponential
distribution. Thus the general recipe for cooking up a stationary process,
involving $X_0^*$ and $U$ in general, simplifies beyond recognition: the
original simple schedule with arrivals at times
$X_1, X_1 + X_2, X_1 + X_2 + X_3, \ldots$ forms half of a stationary process,
which is completed by its other half, arrivals at
$-X_1', -(X_1' + X_2'), \ldots$, with $X_1, X_2, \ldots, X_1', X_2', \ldots$
all independent and exponentially distributed.

6 Size bias in statistics 

But size biasing is not always undesired. In fact, it can be used to
construct unbiased estimators of quantities that are at first glance
difficult to estimate without bias. Suppose we have a population of $n$
individuals, and associated to each individual $i$ is the pair of real
numbers $x_i \ge 0$ and $y_i$, with $\sum_i x_i > 0$. Perhaps $x_i$ is how
much the $i$-th customer was billed by their utility company last month, and
$y_i$, say a smaller value than $x_i$, the amount they were supposed to have
been billed. Suppose we would like to know just how severe the overbilling
error is; we would like to know the 'adjustment factor', which is the ratio
$\sum_i y_i / \sum_i x_i$. Collecting the paired values for everyone is
laborious and expensive, so we would like to be able to use a sample of
$m < n$ pairs to make an estimate. It is not too hard to verify that if we
choose a set $R$ by selecting $m$ pairs uniformly from the $n$, then the
estimate $\sum_{j \in R} y_j / \sum_{j \in R} x_j$ will be biased; that is,
the estimate, on average, will not equal the ratio we are trying to estimate.

Here's how size biasing can be used to construct an unbiased estimate of the
ratio $\sum_i y_i / \sum_i x_i$, using $m < n$ pairs. Create a random set $R$
of size $m$ by first selecting a pair with probability proportional to $x_j$,
and then $m - 1$ pairs uniformly from the remaining pairs. Though we are out
of the independent framework, the principle of (12) is still at work; size
biasing one has size biased the sum. (This is so because we have size biased
the one, and then chosen the others from an appropriate conditional
distribution.) That is, one can now show that by biasing to include the
single element in proportion to its $x$ value, we have achieved a
distribution whereby the probability of choosing the set $r$ is proportional
to $\sum_{j \in r} x_j$. From this observation it is not hard to see why
$\mathbb{E}\left(\sum_{j \in R} y_j / \sum_{j \in R} x_j\right) = \sum_j y_j / \sum_j x_j$.
This method is known as Midzuno's procedure for unbiased ratio estimation,
and is noted in Cochran [8].
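
For a small population the two sampling schemes can be compared exactly by enumerating every subset. The sketch below is illustrative: the billing figures `x` and `y` are hypothetical, and the function names are not from the source. Under the size biased first draw, the probability of ending with the set $r$ is $\sum_{i \in r} x_i / (\sum_i x_i \binom{n-1}{m-1})$, which the code uses directly.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def midzuno_expectation(x, y, m):
    """Exact E[sum_R y / sum_R x] when the first pair is drawn with probability
    proportional to x_i and the other m - 1 uniformly from the remaining pairs."""
    n, total_x = len(x), sum(x)
    expectation = Fraction(0)
    for r in combinations(range(n), m):
        sx = sum(x[i] for i in r)
        sy = sum(y[i] for i in r)
        # P(R = r) = sum_{i in r} (x_i / total_x) / C(n-1, m-1)
        prob = Fraction(sx, total_x * comb(n - 1, m - 1))
        if sx > 0:
            expectation += prob * Fraction(sy, sx)
    return expectation

def uniform_expectation(x, y, m):
    """Exact E[sum_R y / sum_R x] when R is a uniformly chosen m-subset."""
    subsets = list(combinations(range(len(x)), m))
    ratios = [Fraction(sum(y[i] for i in r), sum(x[i] for i in r)) for r in subsets]
    return sum(ratios) / len(subsets)

x = [10, 20, 30, 100]   # hypothetical billed amounts
y = [9, 20, 24, 60]     # hypothetical correct amounts
target = Fraction(sum(y), sum(x))
```

With these numbers `midzuno_expectation(x, y, 2)` equals `target` exactly, while `uniform_expectation(x, y, 2)` does not.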

7 Size biasing, tightness, and uniform integrability

Recall that a collection of random variables $\{Y_\alpha : \alpha \in I\}$ is
tight iff for all $\varepsilon > 0$ there exists $L < \infty$ such that

$$\mathbb{P}(Y_\alpha \notin [-L, L]) < \varepsilon \quad \text{for all } \alpha \in I.$$

This definition looks quite similar to the definition of uniform
integrability, where we say $\{X_\alpha : \alpha \in I\}$ is uniformly
integrable, or UI, iff for all $\delta > 0$ there exists $L < \infty$ such
that

$$\mathbb{E}(|X_\alpha|;\ X_\alpha \notin [-L, L]) < \delta \quad \text{for all } \alpha \in I.$$

Intuitively, tightness for a family is that uniformly over the family, 
the probability mass due to large values is arbitrarily small. Similarly, 
uniform integrability is the condition that, uniformly over the family, 
the contribution to the expectation due to large values is arbitrarily 
small. 

Tightness of the family of random variables $\{Y_\alpha : \alpha \in I\}$
implies that every sequence of variables $Y_n$, $n = 1, 2, \ldots$ from the
family has a subsequence that converges in distribution. The concept of
tightness is very useful not just for random variables, that is, real-valued
random objects, but also for random elements of other spaces; in more general
spaces, the closed intervals $[-L, L]$ are replaced by compact sets. If
$\{X_\alpha : \alpha \in I\}$ is uniformly integrable, then
$\mathbb{E}X_n \to \mathbb{E}X$ for any sequence of variables $X_n$,
$n = 1, 2, \ldots$ from the family that converges in distribution to a limit
$X$.

To discuss the connection between size biasing and uniform integrability, it
is useful to restate the basic definitions in terms of nonnegative random
variables. It is clear from the definition of tightness above that a family
of nonnegative random variables $\{Y_\alpha : \alpha \in I\}$ is tight iff
for all $\varepsilon > 0$ there exists $L < \infty$ such that

$$\mathbb{P}(Y_\alpha > L) < \varepsilon \quad \text{for all } \alpha \in I, \tag{13}$$

and from the definition of UI, that a family of nonnegative random variables
$\{X_\alpha : \alpha \in I\}$ is uniformly integrable iff for all
$\delta > 0$ there exists $L < \infty$ such that

$$\mathbb{E}(X_\alpha;\ X_\alpha > L) < \delta \quad \text{for all } \alpha \in I. \tag{14}$$

For general random variables, the family $\{G_\alpha : \alpha \in I\}$ is
tight [respectively UI] iff $\{|G_\alpha| : \alpha \in I\}$ is tight
[respectively UI]. We specialize in the remainder of this section to random
variables that are non-negative with finite, strictly positive mean.

Since size bias relates contribution to the expectation to probability mass,
there should be a connection between tightness, size bias, and UI. However,
care should be taken to distinguish between the (additive) contribution to
expectation, and the relative contribution to expectation. The following
example makes this distinction clear. Let

$$\mathbb{P}(X_n = n) = 1/n^2, \quad \mathbb{P}(X_n = 0) = 1 - 1/n^2, \quad n = 1, 2, \ldots.$$

Here, $\mathbb{E}X_n = 1/n$, the family $\{X_n\}$ is uniformly integrable,
but $1 = \mathbb{P}(X_n^* = n)$, so the family $\{X_n^*\}$ is not tight. The
trouble is that the additive contribution to the expectation from large
values of $X_n$ is small, but the relative contribution is large: one hundred
percent! The following two theorems, which exclude this phenomenon, show that
tightness and uniform integrability are very closely related.

Theorem 7.1 Assume that for $\alpha \in I$, where $I$ is an arbitrary index
set, the random variables $X_\alpha$ satisfy $X_\alpha \ge 0$ and
$c \le \mathbb{E}X_\alpha < \infty$, for some $c > 0$. For each $\alpha$ let
$Y_\alpha = X_\alpha^*$. Then

$\{X_\alpha : \alpha \in I\}$ is UI iff $\{Y_\alpha : \alpha \in I\}$ is tight.

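Proof. First, with $Y_\alpha = X_\alpha^*$, we have
$\mathbb{P}(Y_\alpha > L) = \mathbb{E}(\mathbf{1}(Y_\alpha > L)) = \mathbb{E}(X_\alpha \mathbf{1}(X_\alpha > L))/\mathbb{E}X_\alpha$,
so for any $L$ and $\alpha \in I$,

$$\mathbb{E}(X_\alpha;\ X_\alpha > L) = \mathbb{E}X_\alpha\,\mathbb{P}(Y_\alpha > L).$$

Assume that $\{X_\alpha : \alpha \in I\}$ is UI, and let $\varepsilon > 0$ be
given to test tightness in (13). Let $L$ be such that (14) is satisfied with
$\delta = \varepsilon c$. Now, using $\mathbb{E}X_\alpha \ge c$, for every
$\alpha \in I$,

$$\mathbb{P}(Y_\alpha > L) = \mathbb{E}(X_\alpha;\ X_\alpha > L)/\mathbb{E}X_\alpha \le \mathbb{E}(X_\alpha;\ X_\alpha > L)/c < \delta/c = \varepsilon,$$

establishing (13).

Second, assume that $\{Y_\alpha : \alpha \in I\}$ is tight, and take $L_0$ to
satisfy (13) with $\varepsilon := 1/2$, so that
$\mathbb{P}(Y_\alpha > L_0) < 1/2$ for all $\alpha \in I$. Hence, for all
$\alpha \in I$,

$$\mathbb{E}(X_\alpha;\ X_\alpha > L_0) = \mathbb{E}X_\alpha\,\mathbb{P}(Y_\alpha > L_0) < \mathbb{E}X_\alpha/2,$$

and therefore,

$$L_0 \ge \mathbb{E}(X_\alpha;\ X_\alpha \le L_0) = \mathbb{E}X_\alpha - \mathbb{E}(X_\alpha;\ X_\alpha > L_0) > \mathbb{E}X_\alpha - \mathbb{E}X_\alpha/2 = \mathbb{E}X_\alpha/2,$$

and hence $\mathbb{E}X_\alpha < 2L_0$. Now given $\delta > 0$ let $L$ satisfy
(13) for $\varepsilon = \delta/(2L_0)$. Hence for all $\alpha \in I$,

$$\mathbb{E}(X_\alpha;\ X_\alpha > L) = \mathbb{E}X_\alpha\,\mathbb{P}(Y_\alpha > L) < 2L_0\,\mathbb{P}(Y_\alpha > L) < 2L_0\,\varepsilon = \delta,$$

establishing (14).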



Theorem 7.2 Assume that for $\alpha \in I$, where $I$ is an arbitrary index
set, the random variables $X_\alpha$ satisfy $X_\alpha \ge 0$ and
$\mathbb{E}X_\alpha < \infty$. Pick any $c \in (0, \infty)$, and for each
$\alpha$ let $Y_\alpha = (c + X_\alpha)^*$. Then

$\{X_\alpha : \alpha \in I\}$ is UI iff $\{Y_\alpha : \alpha \in I\}$ is tight.

Proof. By Theorem 7.1, the family $\{c + X_\alpha\}$ is UI iff the family
$\{(c + X_\alpha)^*\}$ is tight. As it is easy to verify that the family
$\{X_\alpha\}$ is tight [respectively UI] iff the family $\{c + X_\alpha\}$
is tight [respectively UI], Theorem 7.2 follows directly from Theorem 7.1.

8 Size biasing and infinite divisibility: the heuristic

Because of the recipe (12), it is natural that our question in (5) is related
to the concept of infinite divisibility. We say that a random variable $X$ is
infinitely divisible if for all $n$, $X$ can be decomposed in distribution as
the sum of $n$ iid variables. That is, for all $n$ there exists a
distribution $dF_n$ such that if $X_1^{(n)}, \ldots, X_n^{(n)}$ are iid with
this distribution, then

$$X \stackrel{d}{=} X_1^{(n)} + \cdots + X_n^{(n)}. \tag{15}$$

Because this is an iid sum, by (12), we have

$$X^* = (X - X_1^{(n)}) + (X_1^{(n)})^*,$$

with $X - X_1^{(n)}$ and $(X_1^{(n)})^*$ independent. For large $n$,
$X - X_1^{(n)}$ will be close to $X$, and so we have represented the size
bias distribution of $X$ as approximately equal, in distribution, to $X$ plus
an independent increment. Hence it is natural to suspect that the class of
non-negative infinitely divisible random variables can be size biased by
adding an independent increment.

It is not difficult to make the above argument rigorous, for infinitely
divisible $X \ge 0$ with $\mathbb{E}X < \infty$. First, to show that
$X - X_1^{(n)}$ converges in distribution to $X$, it suffices to show that
$X_1^{(n)}$ converges to zero in probability. Note that $X \ge 0$ implies
$X_1^{(n)} \ge 0$, since (15) gives
$0 = \mathbb{P}(X < 0) \ge (\mathbb{P}(X_1^{(n)} < 0))^n$. Then, given
$\varepsilon > 0$,
$\infty > \mathbb{E}X \ge n\,\mathbb{P}(X_1^{(n)} > \varepsilon)\,\varepsilon$
implies that $\mathbb{P}(X_1^{(n)} > \varepsilon) \to 0$; hence
$X_1^{(n)} \to 0$ in probability as $n \to \infty$.

We have that

$$X^* = (X - X_1^{(n)}) + (X_1^{(n)})^*, \tag{16}$$

with $X - X_1^{(n)}$ and $(X_1^{(n)})^*$ independent, and $X - X_1^{(n)}$
converging to $X$ in distribution. Now, the family of random variables
$(X_1^{(n)})^*$ is "tight", because given $\varepsilon > 0$, there is a $K$
such that $\mathbb{P}(X^* > K) < \varepsilon$, and by (16), for all $n$,
$\mathbb{P}((X_1^{(n)})^* > K) \le \mathbb{P}(X^* > K) < \varepsilon$. Thus,
by Helly's theorem, there exists a subsequence $n_k$ of the $n$'s along which
$(X_1^{(n_k)})^*$ converges in distribution, say
$(X_1^{(n_k)})^* \stackrel{d}{\to} Y$. Taking $n \to \infty$ along this
subsequence, the pair $(X - X_1^{(n)}, (X_1^{(n)})^*)$ converges jointly to
the pair $(X, Y)$ with $X$ and $Y$ independent. From
$X^* = (X - X_1^{(n_k)}) + (X_1^{(n_k)})^* \stackrel{d}{\to} X + Y$ as
$k \to \infty$ we conclude that $X^* \stackrel{d}{=} X + Y$, with $Y \ge 0$,
and $X, Y$ independent. This concludes a proof that if $X \ge 0$ with
$0 < \mathbb{E}X < \infty$ is infinitely divisible, then it satisfies (5).



9 Size biasing and Compound Poisson 

Let us now return to our main theme, determining for which distributions we
have (5). We have already seen it is true, trivially, for a scale multiple of
a Poisson random variable. We combine this with the observation (11) that to
size bias a sum of independent random variables, just bias a single summand,
chosen proportional to its expectation. Consider a random variable of the
form

$$X = \sum_{j=1}^n X_j, \quad \text{with } X_j = y_j Z_j, \quad Z_j \sim \mathrm{Po}(\lambda_j), \quad Z_1, \ldots, Z_n \text{ independent}, \tag{17}$$

with distinct constants $y_j > 0$.

Since $X$ is a sum of independent variables, we can size bias $X$ by the
recipe (11): pick a summand proportional to its expectation and size bias
that one. We have $\mathbb{E}X_j = y_j\lambda_j$ and therefore
$a = \mathbb{E}X = \sum_j y_j\lambda_j$. Hence, the probability that we pick
summand $j$ to size bias is

$$\mathbb{P}(I = j) = y_j\lambda_j/a.$$

But by (9), $X_j^* \stackrel{d}{=} X_j + y_j$, so that when we pick $X_j$ to
bias we add $y_j$. Hence, to bias $X$ we merely add $y_j$ with probability
$y_j\lambda_j/a$, or, to put it another way, $X^* = X + Y$, with $X, Y$
independent and

$$\mathbb{P}(Y = y_j) = y_j\lambda_j/a. \tag{18}$$

In summary, $X$ of the form (17) can be size biased by adding an independent,
nonnegative increment. It will turn out that we have now nearly found all
solutions of (5), which will be obtained by taking limits of variables of
type (17) and adding a nonnegative constant.
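
When the $y_j$ are integers, the statement $X^* \stackrel{d}{=} X + Y$ with $Y$ as in (18) can be checked by computing mass functions. The sketch below assumes hypothetical parameters `ys` and `lams`; the restriction to integer $y_j$ is only for convenience of the illustration. Masses at values below `K` involve only finitely many terms, so the comparison is limited by floating point error alone.

```python
import math

def poisson_pmf(lam, k):
    """P(Z = k) for Z ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def finite_type_pmf(ys, lams, K):
    """Pmf on 0..K-1 of X = sum_j y_j Z_j with integer y_j and Z_j ~ Po(lam_j)."""
    pmf = [1.0] + [0.0] * (K - 1)                 # start from a point mass at 0
    for y, lam in zip(ys, lams):
        new = [0.0] * K
        for k, p in enumerate(pmf):
            z = 0
            while k + y * z < K:
                new[k + y * z] += p * poisson_pmf(lam, z)
                z += 1
        pmf = new
    return pmf

ys, lams = (1, 3), (0.7, 0.4)                     # hypothetical parameters
K = 40
a = sum(y * lam for y, lam in zip(ys, lams))      # a = EX

p = finite_type_pmf(ys, lams, K)
biased = [k * p[k] / a for k in range(K)]         # pmf of X* from (1)

# Pmf of X + Y with Y independent of X and P(Y = y_j) = y_j lam_j / a, as in (18).
summed = [0.0] * K
for y, lam in zip(ys, lams):
    for k in range(y, K):
        summed[k] += p[k - y] * y * lam / a

max_gap = max(abs(b - s) for b, s in zip(biased, summed))
```

Agreement of `biased` and `summed` is the same as the recursion $k\,p_k = \sum_j y_j\lambda_j\,p_{k - y_j}$ for the masses $p_k$ of $X$.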

Sums of the form (17) are said to have a compound Poisson distribution of
finite type. Compound Poisson variables in general are obtained by a method
which at first glance looks unrelated to (17), considering the sum $S_N$
formed by adding a Poisson number $N$ of iid summands $A_1, A_2, \ldots$ from
any distribution, i.e. taking

$$X \stackrel{d}{=} S_N := A_1 + \cdots + A_N, \quad N \sim \mathrm{Po}(\lambda), \tag{19}$$

where $N, A_1, A_2, \ldots$ are independent and $A_1, A_2, \ldots$
identically distributed. The notation $S_N := A_1 + \cdots + A_N$ reflects
that of a random walk, $S_n := A_1 + \cdots + A_n$ for $n = 0, 1, 2, \ldots$,
with $S_0 = 0$.

To fit the sum (17) into the form (19), let $\lambda = \sum_j \lambda_j$ and
let $A, A_1, A_2, \ldots$ be iid with

$$\mathbb{P}(A = y_j) = \lambda_j/\lambda. \tag{20}$$

The claim is that $X$ as specified by (17) has the same distribution as the
sum $S_N$. Checking this claim is most easily done with generating functions,
such as characteristic functions, discussed in the next section.
Nevertheless, it is an enjoyable exercise for the special case where
$y_1, \ldots, y_n$ are mutually irrational, i.e. linearly independent over
the rationals, so that with $k_1, \ldots, k_n$ integers, the sum
$\sum_1^n k_j y_j$ determines the values of the $k_j$.
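
The claim that (17) and (19) describe the same distribution can also be checked numerically without generating functions when the values are integers. The sketch below uses hypothetical integer values `ys` and rates `lams`; it builds the mass function of the sum of scaled Poissons, and separately the Poisson mixture of $n$-fold convolutions of the summand distribution (20).

```python
import math

def poisson_pmf(lam, k):
    """P(Z = k) for Z ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def convolve(p, q, K):
    """First K masses of the sum of independent variables with pmfs p and q."""
    out = [0.0] * K
    for i, pi in enumerate(p):
        if pi:
            for j in range(K - i):
                out[i + j] += pi * q[j]
    return out

ys, lams = (1, 3), (0.7, 0.4)      # hypothetical integer values y_j and rates lam_j
K = 30
lam = sum(lams)

# Form (17): independent scaled Poissons y_j Z_j, convolved together.
scaled = []
for y, l in zip(ys, lams):
    pmf = [0.0] * K
    for z in range((K - 1) // y + 1):
        pmf[y * z] = poisson_pmf(l, z)
    scaled.append(pmf)
direct = convolve(scaled[0], scaled[1], K)

# Form (19): S_N with N ~ Po(lam) and P(A = y_j) = lam_j / lam, as in (20).
step = [0.0] * K
for y, l in zip(ys, lams):
    step[y] = l / lam
compound = [0.0] * K
n_fold = [1.0] + [0.0] * (K - 1)   # pmf of S_0 = 0
for n in range(K):                 # A >= 1, so S_n >= n; n >= K adds nothing below K
    for k in range(K):
        compound[k] += poisson_pmf(lam, n) * n_fold[k]
    n_fold = convolve(n_fold, step, K)

max_gap = max(abs(d - c) for d, c in zip(direct, compound))
```

Both constructions give the same first `K` masses up to rounding, for example mass $e^{-\lambda}$ at zero.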

Note that the distribution (20) of the summands $A_i$ is different from the
distribution in (18). In fact, $A^* \stackrel{d}{=} Y$, which can be checked
using (20) together with (3), and comparing the result to (18):

$$\mathbb{P}(A^* = y_j) = \frac{y_j\,\mathbb{P}(A = y_j)}{\mathbb{E}A} = \frac{y_j\lambda_j/\lambda}{\sum_k y_k\lambda_k/\lambda} = \frac{y_j\lambda_j}{a} = \mathbb{P}(Y = y_j). \tag{21}$$

Thus the result that for the compound Poisson of finite type, $X^* = X + Y$,
can be expressed as

$$(S_N)^* = S_N + Y = A_1 + \cdots + A_N + A^*, \tag{22}$$

with all summands independent. Note how this contrasts with the recipe (12)
for size biasing a sum with a deterministic number of terms $n$; in the case
of (12) the biased sum and the original sum have the same number of terms,
but in (22) the biased sum has one more term than the original sum.

If we want to size bias a general compound Poisson random variable $S_N$,
there must be some restrictions on the distribution for the iid summands
$A_i$ in (19). First, since $\lambda > 0$, and (using the independence of $N$
and $S_0, S_1, \ldots$)
$\mathbb{E}S_N = \mathbb{E}N\,\mathbb{E}A = \lambda\,\mathbb{E}A$, the
condition that $\mathbb{E}S_N \in (0, \infty)$ is equivalent to
$\mathbb{E}A \in (0, \infty)$. The condition that $S_N \ge 0$ is equivalent
to the condition $A_i \ge 0$. We choose the additional requirement that $A$
be strictly positive, which is convenient since it enables the simple
computation $\mathbb{P}(S_N = 0) = \mathbb{P}(N = 0) = e^{-\lambda}$. There
is no loss of generality in this added restriction, for if
$p := \mathbb{P}(A = 0) > 0$, then with $M$ being Poisson with parameter
$\mathbb{E}M = \lambda(1 - p)$, and $B_1, B_2, \ldots$ iid and independent of
$M$, with $\mathbb{P}(B \in I) = \mathbb{P}(A \in I)/(1 - p)$ for
$I \subset (0, \infty)$ (using $p < 1$ since $\mathbb{E}A > 0$), we have
$A_1 + \cdots + A_N \stackrel{d}{=} B_1 + \cdots + B_M$, so that $S_N$ can be
represented as a compound Poisson with strictly positive summands.

10 Compound Poisson vs. Infinitely Divisible

We now have two ways to produce solutions of (5), infinitely divisible
distributions from section 8, and finite type compound Poisson distributions,
from (17), so naturally the next question is: how are these related?

The finite type compound Poisson random variable in (17) can be extended by
adding in a nonnegative constant $c$, to get $X$ of the form

$$X = c + \sum_{j=1}^n X_j, \quad \text{with } X_j = y_j Z_j, \quad Z_j \sim \mathrm{Po}(\lambda_j), \quad Z_1, \ldots, Z_n \text{ independent}, \tag{23}$$

with $c \ge 0$ and distinct constants $y_j > 0$. In case $c > 0$, this random
variable is not called compound Poisson. Every random variable of the form
(23) is infinitely divisible: for the $X_i^{(m)}$ [for $i = 1$ to $m$ in
(15)] simply take a sum of the same form, but with $c$ replaced by $c/m$ and
$\lambda_j$ replaced by $\lambda_j/m$.

The following two facts are not meant to be obvious. By taking distributional
limits of sums of the form (23), and requiring that
$c + \sum_1^n y_j\lambda_j$ stays bounded as $n \to \infty$, one gets all the
non-negative infinitely divisible distributions with finite mean. By also
requiring that $c = 0$ and $\sum_1^n \lambda_j$ stays bounded, the limits are
all the finite mean, non-negative compound Poisson distributions.

To proceed, we calculate the characteristic function for the distribution in
(17). First, if $X$ is $\mathrm{Po}(\lambda)$, then

$$\phi_X(u) := \mathbb{E}e^{iuX} = \sum_{k \ge 0} e^{iuk}\,\mathbb{P}(X = k) = \sum_{k \ge 0} e^{iuk} e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\sum_{k \ge 0}\frac{(\lambda e^{iu})^k}{k!} = \exp(\lambda(e^{iu} - 1)).$$

For a scalar multiple $yX$ of a Poisson random variable,

$$\phi_{yX}(u) = \mathbb{E}e^{iu(yX)} = \mathbb{E}e^{i(uy)X} = \phi_X(yu) = \exp(\lambda(e^{iuy} - 1)).$$

Thus the summand $X_j = y_j Z_j$ in (17) has characteristic function
$\exp(\lambda_j(e^{iuy_j} - 1))$, and hence the sum $X$ has characteristic
function

$$\phi_X(u) = \prod_{j=1}^n \exp(\lambda_j(e^{iuy_j} - 1)) = \exp\Big(\sum_{j=1}^n \lambda_j(e^{iuy_j} - 1)\Big).$$

To prepare for taking limits, we write this as

$$\phi_X(u) = \exp\Big(\sum_{j=1}^n \lambda_j(e^{iuy_j} - 1)\Big) = \exp\Big(\int (e^{iuy} - 1)\,\mu(dy)\Big), \tag{24}$$

where $\mu$ is the measure on $(0, \infty)$ which places mass $\lambda_j$ at
location $y_j$. The total mass of $\mu$ is
$\int 1\,\mu(dy) = \sum_1^n \lambda_j$, which we will denote by $\lambda$,
and the first moment of $\mu$ is $\int y\,\mu(dy) = \sum_1^n y_j\lambda_j$,
which happens to equal $a := \mathbb{E}X$.

Allowing the addition of a constant $c$, the random variable of the
form (23) has characteristic function $\phi_X$ whose logarithm has the form
$\log\phi_X(u) = iuc + \int_{(0,\infty)}(e^{iuy} - 1)\,\mu(dy)$, where $\mu$ is a measure whose
support consists of a finite number of points. The finite mean, not
identically zero distributional limits of such random variables yield all
of the finite mean, nonnegative, not identically zero, infinitely
divisible distributions. A random variable $X$ with such a distribution has
characteristic function $\phi_X$ with

$$\log\phi_X(u) = iuc + \int_{(0,\infty)} (e^{iuy} - 1)\,\mu(dy), \tag{25}$$

where $c \ge 0$, $\mu$ is any nonnegative measure on $(0,\infty)$ such that
$\int y\,\mu(dy) < \infty$, and not both $c$ and $\mu$ are zero.

Which of the distributions above are compound Poisson? The
compound Poisson variables are the ones in which $c = 0$ and
$\lambda := \mu((0,\infty)) < \infty$. With this choice of $\lambda$ for the parameter of $N$, we have
$X = S_N$ as in (19), with the distribution of $A$ given by $\mu/\lambda$, i.e.
$\mathbb{P}(A \in dy) = \mu(dy)/\lambda$. To check, note that for $X = S_N$,

$$\phi_X(u) := \mathbb{E}e^{iuS_N} = \sum_{n \ge 0} \mathbb{P}(N = n)\,\mathbb{E}e^{iuS_n} = e^{-\lambda}\sum_{n \ge 0} \frac{\lambda^n}{n!}(\phi_A(u))^n = \exp\big(\lambda(\phi_A(u) - 1)\big) = \exp\Big(\lambda\int_{(0,\infty)} (e^{iuy} - 1)\,\mathbb{P}(A \in dy)\Big),$$

which is exactly (25) with $c = 0$.

In our context, it is easy to tell whether or not a given infinitely
divisible random variable is also compound Poisson: it is if and only
if $\mathbb{P}(X = 0)$ is strictly positive, corresponding to $\mathbb{P}(N = 0) = e^{-\lambda}$
and $e^{-\infty} = 0$. Among the examples of infinitely divisible random
variables in section 16.2, the only compound Poisson examples are the
Geometric and Negative binomial family, and the distributions related
to Buchstab's function.

In (18), which specifies the distribution of the increment $Y$ when
$X^* = X + Y$ with $X, Y$ independent, the factor $y_j$ on the right
hand side suggests another way to write (25). We multiply and divide
by $y$ to get

$$\log\phi_X(u) = \int_{[0,\infty)} \frac{e^{iuy} - 1}{y}\,\nu(dy), \tag{26}$$

where $\nu$ is any nonnegative measure on $[0,\infty)$ with total mass
$\nu([0,\infty)) \in (0,\infty)$. The measure $\nu$ on $[0,\infty)$ is related to $c$ and $\mu$ by
$\nu(\{0\}) = c$ and for $y > 0$, $\nu(dy) = y\,\mu(dy)$; we follow the natural convention
that $(e^{iuy} - 1)/y$ for $y = 0$ is interpreted as $iu$. The measure $\nu/a$ is a
probability measure on $[0,\infty)$ because

$$a := \mathbb{E}X = -i\,\frac{d\log\phi_X(u)}{du}\Big|_{u=0} = c + \int_{(0,\infty)} y\,\mu(dy) = c + \nu((0,\infty)) = \nu([0,\infty)). \tag{27}$$
We believe it is proper to refer to either (25) or (26) as a Lévy
representation, and to refer to either $\mu$ or $\nu$ as the Lévy measure, in
honor of Paul Lévy; indeed when the restriction that $X$ be nonnegative
is dropped, there are still more forms for the Lévy representation, see
e.g. Feller [9], volume II, chapter XVII.

It is not hard to see that any distribution specified by (26) satisfies
(5), by the following calculation with characteristic functions. Note
that for $g(x) := e^{iux}$, the characterization (2) of the distribution of $X^*$
directly gives the characteristic function $\phi^*$ of $X^*$ as
$\phi^*(u) := \mathbb{E}e^{iuX^*} = \mathbb{E}(Xe^{iuX})/a$. For any $X \ge 0$ with finite mean, by an
application of the dominated convergence theorem, if $\phi(u) := \mathbb{E}e^{iuX}$ then
$\phi'(u) = \mathbb{E}(iXe^{iuX})$. Thus for any $X \ge 0$ with $0 < a = \mathbb{E}X < \infty$,

$$\phi^*(u) = \frac{1}{ia}\,\phi'(u). \tag{28}$$

Now if $X$ has characteristic function $\phi$ given by (26), again using
dominated convergence,

$$\phi'(u) = \phi(u)\Big(ic + \int_{(0,\infty)} iy\,e^{iuy}\,\mu(dy)\Big) = ia\,\phi(u)\int_{[0,\infty)} e^{iuy}\,\nu(dy)/a. \tag{29}$$

Taking the probability measure $\nu/a$ as the distribution of $Y$, and
writing $\eta$ for the characteristic function of $Y$, (29) says that
$\phi'(u) = ia\,\phi(u)\,\eta(u)$. Combined with (28), we have

$$\phi^* = \phi\,\eta. \tag{30}$$

Thus $X^* = X + Y$, with $X$ and $Y$ independent and $\mathcal{L}(Y) = \nu/a$.
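
As a sanity check of (28) through (30), the following sketch treats a finite type example: $X = c + \sum_j y_j Z_j$, so that $\nu$ has mass $c$ at $0$ and mass $y_j\lambda_j$ at $y_j$. The parameter values and function names are illustrative assumptions, and the derivative $\phi'$ is approximated by a central difference rather than computed exactly.

```python
import cmath

# Illustrative (assumed) parameters for X = c + sum_j y_j * Z_j, Z_j ~ Po(lam_j).
c, ys, lams = 0.5, [1.0, 2.5], [0.7, 1.3]
a = c + sum(y * lam for y, lam in zip(ys, lams))  # a = EX = total mass of nu


def phi(u):
    """Characteristic function of X."""
    return cmath.exp(
        1j * u * c + sum(lam * (cmath.exp(1j * u * y) - 1) for y, lam in zip(ys, lams))
    )


def eta(u):
    """Characteristic function of Y, whose law nu/a has mass c/a at 0 and y_j*lam_j/a at y_j."""
    return (c + sum(y * lam * cmath.exp(1j * u * y) for y, lam in zip(ys, lams))) / a


def phi_star(u, h=1e-5):
    """Right side of (28): phi'(u)/(i a), with phi' from a central difference."""
    return (phi(u + h) - phi(u - h)) / (2 * h) / (1j * a)


max_gap = max(abs(phi_star(0.3 * k) - phi(0.3 * k) * eta(0.3 * k)) for k in range(-10, 11))
```

The residual `max_gap` is limited only by the finite difference error, consistent with $\phi^* = \phi\,\eta$.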
For the compound Poisson case in general, in which the distribution
of $A$ is $\mu/\lambda$, we have $Y = A^*$ because $a := \mathbb{E}X = \lambda\,\mathbb{E}A$ and

$$\mathbb{P}(A^* \in dy) = \frac{y\,\mathbb{P}(A \in dy)}{\mathbb{E}A} = \frac{y\,(\mu(dy)/\lambda)}{a/\lambda} = \nu(dy)/a = \mathbb{P}(Y \in dy), \tag{31}$$

which can be compared with the discrete version (20). Thus the computation
(30) shows, for the compound Poisson case, that (22) holds.



11 Main Result 

Our main result is essentially the converse of the computation (30).

Theorem 11.1 (Steutel 1973 [17]) For a random variable $X \ge 0$
with $a := \mathbb{E}X \in (0,\infty)$, the following three statements are equivalent.

i) There exists a coupling with $X^* = X + Y$, $Y \ge 0$, and $X, Y$
independent.

ii) The distribution of $X$ is infinitely divisible.

iii) The characteristic function of $X$ has the Lévy representation (26).

Furthermore, when any of these statements hold, the Lévy measure
$\nu$ in (26) equals $a$ times the distribution of $Y$.



Proof. We have proved that ii) implies i), in section 8. We have
proved that iii) implies i), in the argument ending with (30), which
also shows that given $\nu$ in the Lévy representation (26), the increment
$Y$ in i) has distribution $\nu/a$.

The equivalence of ii) and iii) is a standard textbook topic, with
the argument for iii) implies ii) being simply that $X$ with a given $\nu$ is
the sum of $n$ iid terms each having the Lévy representation (26) with
$\nu/n$ playing the role of $\nu$.

Now to prove that i) implies ii), we assume that i) holds. The
characteristic function $\phi^*$ of $X^*$ has the form $\phi^* = \phi\,\eta$, where $\phi$ and $\eta$
are the characteristic functions of $X$ and $Y$, so that
$\eta(u) = \mathbb{E}e^{iuY} = \int_{[0,\infty)} e^{iuy}\,\mathbb{P}(Y \in dy)$. Combining this with (28) we have

$$\frac{1}{ia}\,\phi'(u) = \phi^*(u) = \phi(u)\,\eta(u),$$

so that $(\log\phi(u))' = ia\,\eta(u)$. Since $\log\phi(0) = 0$,

$$\begin{aligned}
\log\phi(u) &= ia\int_{s\in[0,u)} \eta(s)\,ds \\
&= ia\int_{s\in[0,u)}\int_{y\in[0,\infty)} e^{isy}\,\mathbb{P}(Y \in dy)\,ds \\
&= ia\int_{y\in[0,\infty)}\int_{s\in[0,u)} e^{isy}\,ds\,\mathbb{P}(Y \in dy) \\
&= ia\Big[\int_{y\in(0,\infty)}\int_{s\in[0,u)} e^{isy}\,ds\,\mathbb{P}(Y \in dy) + u\,\mathbb{P}(Y = 0)\Big] \\
&= a\int_{y\in[0,\infty)} \frac{e^{iuy} - 1}{y}\,\mathbb{P}(Y \in dy).
\end{aligned}$$

This is the same as the representation (26), with $\nu = a\,\mathcal{L}(Y)$ for the
random variable $Y$ given in i).

■ 
Observe that $\nu/a$ is an arbitrary probability distribution on $[0,\infty)$,
i.e. $\nu/a \in \Pr([0,\infty))$, and the choice of $a \in (0,\infty)$ is also arbitrary. Thus
there is a one-to-one correspondence between the Cartesian product
$\Pr([0,\infty)) \times (0,\infty)$ and the set of the nonnegative, infinitely divisible
distributions with finite, strictly positive mean.

12 A consequence of X* = X + Y with 
independence 

To paraphrase the result of Theorem 11.1, for a nonnegative random
variable $X$ with $a := \mathbb{E}X \in (0,\infty)$, it is possible to find a coupling
with $X^* = X + Y$, $Y \ge 0$ and $X, Y$ independent if and only if the
distribution of $X$ is the infinitely divisible distribution with Lévy
representation (26) governed by the finite measure $\nu$ equal to $a$ times
the distribution of $Y$. Thus we know an explicit, albeit complicated,
relation between the distributions of $X$ and $Y$. It is worth seeing how
(5) directly gives a simple relation between the densities of $X$ and $Y$,
if these densities exist.

In the discrete case, if $X$ has a mass function $f_X$ and if (26) holds,
then $Y$ must have a mass function, $f_Y$, and by (5), $f_{X^*}$ is the convolution of
$f_X$ and $f_Y$: $f_{X^*}(x) = \sum_y f_X(x-y)f_Y(y)$. Combined with the size bias
relation $f_{X^*}(x) = x f_X(x)/a$, this says that for all $x > 0$,

$$f_X(x) = \frac{a}{x}\sum_y f_X(x-y)\,f_Y(y).$$

Likewise, in the continuous case, if $X$ has density $f_X$ (i.e. if for all
bounded $g$, $\mathbb{E}g(X) = \int g(x)f_X(x)\,dx$,) and if (26) holds, and if further
$Y$ has a density $f_Y$, then by (5), $f_{X^*}$ is the convolution of $f_X$ and $f_Y$:
$f_{X^*}(x) = \int f_X(x-y)f_Y(y)\,dy$. Combined with (3), this says that for all
$x > 0$,

$$f_X(x) = \frac{a}{x}\int_y f_X(x-y)\,f_Y(y)\,dy. \tag{32}$$
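
The discrete relation can be read as a recursion for $f_X$. The sketch below is one possible implementation under stated assumptions: $X$ is integer valued, $Y$ is supported on finitely many positive integers (so $\mathbb{P}(Y=0)=0$ and $X$ is compound Poisson), and the starting value is $f_X(0) = e^{-\lambda}$ with $\lambda = a\sum_j f_Y(j)/j$, the total mass of $\mu$. The function name `pmf_from_increment` is hypothetical.

```python
import math


def pmf_from_increment(a, f_y, x_max):
    """Mass function of X on 0..x_max, given a = EX and f_y[j] = P(Y = j).

    Assumes f_y[0] == 0, so that X is compound Poisson and P(X = 0) = exp(-lam),
    where lam = a * sum_j f_y[j] / j is the total mass of mu.
    """
    lam = a * sum(f_y[j] / j for j in range(1, len(f_y)))
    f_x = [math.exp(-lam)]
    for x in range(1, x_max + 1):
        # f_X(x) = (a / x) * sum_y f_X(x - y) f_Y(y)
        conv = sum(f_x[x - y] * f_y[y] for y in range(1, min(x, len(f_y) - 1) + 1))
        f_x.append(a / x * conv)
    return f_x


# Y = 1 and a = 2 should reproduce the Poisson(2) mass function.
poisson = pmf_from_increment(2.0, [0.0, 1.0], 10)

# Y = 1 + geometric(1/2), truncated at 60 terms, with a = 1: X is geometric(1/2).
geometric = pmf_from_increment(1.0, [0.0] + [0.5 ** j for j in range(1, 61)], 10)
```

With $Y = 1$ the recursion reduces to $f_X(x) = (a/x) f_X(x-1)$, the familiar Poisson recursion.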

13 Historical remark 

For all intents and purposes, Theorem 11.1 is due to Steutel [17]. The
way he states his result is sufficiently different from our Theorem 11.1
that for comparison, we quote verbatim from [17], p. 136:
Theorem 5.3. A d.f. $F$ on $[0,\infty)$ is infinitely divisible iff it satisfies

$$\int_0^x u\,dF(u) = \int_0^x F(x-u)\,dK(u), \tag{5.6}$$

where $K$ is non-decreasing.

Observe that Steutel's result is actually more general than
Theorem 11.1, since the latter only deals with nonnegative infinitely
divisible random variables with finite mean. The explicit connection
between the independent increment for size biasing, and the Lévy
representation, is made in [11], along with further connections between
renewal theory and independent increments.



14 The product rule for size biasing 

We have seen that for independent, nonnegative random variables
$X_1, \ldots, X_n$, the sum $X = X_1 + X_2 + \cdots + X_n$ can be size biased by
picking a single summand at random with probability proportional to its
expectation, and replacing it with one from its size biased distribution.
Is there a comparable procedure for the product $W = X_1 X_2 \cdots X_n$?
Would it involve size-biasing a single factor?

Let $a_i = \mathbb{E}X_i \in (0,\infty)$, let $F_i$ be the distribution function of
$X_i$, and let $F_i^*$ be the distribution function of $X_i^*$, so that
$dF_i^*(x) = x\,dF_i(x)/a_i$. Let $X_1^*, \ldots, X_n^*$ be independent. By (2) with
$a := \mathbb{E}W = a_1 a_2 \cdots a_n$, for all bounded functions $g$,

$$\begin{aligned}
\mathbb{E}g(W^*) &= \mathbb{E}(Wg(W))/(a_1 a_2 \cdots a_n) \\
&= \int\cdots\int x_1 \cdots x_n\, g(x_1 x_2 \cdots x_n)\,dF_1(x_1)\cdots dF_n(x_n)/(a_1 \cdots a_n) \\
&= \int\cdots\int g(x_1 x_2 \cdots x_n)\,(x_1\,dF_1(x_1)/a_1)\cdots(x_n\,dF_n(x_n)/a_n) \\
&= \int\cdots\int g(x_1 x_2 \cdots x_n)\,dF_1^*(x_1)\cdots dF_n^*(x_n) \\
&= \mathbb{E}g(X_1^* \cdots X_n^*),
\end{aligned}$$

and so

$$W^* = X_1^* \cdots X_n^*.$$

We have shown that to size bias a product of independent variables,
one must size bias every factor making up the product, very much
unlike what happens for a sum, where only one term is size biased!
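
The product rule can be confirmed by exact enumeration for small discrete laws. In this sketch the two mass functions `X1` and `X2` are arbitrary illustrative choices, and the helper names are hypothetical; exact rational arithmetic avoids any rounding question.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def size_bias(pmf):
    """Size biased law: mass at x becomes x * P(X = x) / EX (the point 0 drops out)."""
    mean = sum(x * p for x, p in pmf.items())
    return {x: x * p / mean for x, p in pmf.items() if x != 0}


def product_law(pmfs):
    """Law of the product of independent variables with the given mass functions."""
    out = defaultdict(Fraction)
    for combo in product(*(pmf.items() for pmf in pmfs)):
        value, prob = 1, Fraction(1)
        for x, p in combo:
            value *= x
            prob *= p
        out[value] += prob
    return dict(out)


# Illustrative (assumed) laws for two independent nonnegative factors.
X1 = {1: Fraction(1, 2), 2: Fraction(1, 2)}
X2 = {0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 2)}

lhs = size_bias(product_law([X1, X2]))             # law of W*
rhs = product_law([size_bias(X1), size_bias(X2)])  # law of X1* X2*
```

Here `lhs` and `rhs` agree exactly, with every factor size biased on the right.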

15 Size biasing the lognormal distribution

The lognormal distribution is often used in financial mathematics to
model prices, salaries, or values. A variable $L$ with the lognormal
distribution is obtained by exponentiating a normal variable. We
follow the convention that $Z$ denotes a standard normal, with $\mathbb{E}Z = 0$,
$\operatorname{var} Z = 1$, so that $L = e^Z$ represents a standard lognormal. With
constants $\sigma > 0$, $\mu \in \mathbb{R}$, $\sigma Z + \mu$ represents the general normal, and
$L = e^{\sigma Z + \mu}$ represents the general lognormal. As the lognormal is
non-negative and has finite mean, it can be size biased to form $L^*$.

One way to guess the identity of $L^*$ is to use the method of moments.
For the standard case $L = e^Z$, for any real $t$, calculation gives
$\mathbb{E}e^{tZ} = \exp(t^2/2)$. Taking $t = 1$ shows that $\mathbb{E}L = \sqrt{e}$, and more
generally, for $n = 1, 2, \ldots$, $\mathbb{E}L^n = \mathbb{E}e^{nZ} = \exp(n^2/2)$. Using the
moment-shift relation for size biasing, $\mathbb{E}(X^*)^n = \mathbb{E}X^{n+1}/\mathbb{E}X$, we have
$\mathbb{E}(L^*)^n = \mathbb{E}L^{n+1}/\mathbb{E}L = \exp((n+1)^2/2 - 1/2) = \exp(n^2/2 + n) = e^n\,\mathbb{E}L^n = \mathbb{E}(eL)^n$.
Clearly we should guess that $L^* = eL$, but we must be cautious, as the most
famous example of a distribution which has moments of all orders
but which is not determined by them is the lognormal; for other such
distributions related to the normal, see [16].

We present a rigorous method for finding the distribution of $L^*$,
based on the size biasing product rule of the previous section; as an
exercise the reader might try to verify our conclusion (33) by working
out the densities for lognormal distributions, and using the relation
$f_{X^*}(x) = x f_X(x)/a$ between the density of $X$ and that of $X^*$.

We begin with the case $\mu = 0$, $\sigma > 0$. Let $C_i$ be independent
variables taking the values $1$ or $-1$ with equal probability. These
variables have mean zero and variance one, and by the central limit
theorem, we know that

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \sigma C_i \xrightarrow{\ distr\ } \sigma Z.$$

Hence, we must have

$$W = \prod_{i=1}^{n} \exp\Big(\frac{1}{\sqrt{n}}\,\sigma C_i\Big) = \exp\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \sigma C_i\Big) \xrightarrow{\ distr\ } \exp(\sigma Z) = L,$$

a lognormal, and thus $W^* \xrightarrow{\ distr\ } L^*$. Write $X_i := \exp(\sigma C_i/\sqrt{n})$, so
that $W = X_1 \cdots X_n$ with independent factors, and by the product
rule, $W^* = X_1^* \cdots X_n^*$. The variables $X_i$ take on the values $q = e^{-\sigma/\sqrt{n}}$
and $p = e^{\sigma/\sqrt{n}}$ with equal probability, and so the $X_i^*$ take on these same
values, but with probabilities $q/(p+q)$ and $p/(p+q)$ respectively.
Let's say that $B_n$ of the $X_i^*$ take the value $p$, so that $n - B_n$ of the
$X_i^*$ take the value $q$. Using $B_n$, we can write

$$W^* = p^{B_n} q^{\,n - B_n} = e^{\sigma(2B_n - n)/\sqrt{n}}.$$

Since $B_n$ counts the number of "successes" in $n$ independent trials,
with success probability $p/(p+q)$, $B_n$ is distributed
binomial$(n, p/(p+q))$. As $n \to \infty$, the central limit theorem gives that $B_n$
has an approximate normal distribution. Doing a second order Taylor
expansion of $e^x$ around zero, and applying it at $x = \pm\sigma/\sqrt{n}$, we find that
$p/(p+q) = 1/2 + \sigma/(2\sqrt{n}) + O(1/n)$, so that $B_n$ is approximately
normal, with mean $np/(p+q) = (1/2)(n + \sigma\sqrt{n}) + O(1)$ and variance
$npq/(p+q)^2 = n/4 + O(1)$. Hence

$$\frac{1}{\sqrt{n}}\,(2B_n - n) \xrightarrow{\ distr\ } Z + \sigma \quad \text{as } n \to \infty$$

and therefore

$$W^* \xrightarrow{\ distr\ } e^{\sigma(Z+\sigma)}.$$

Since $W^* \xrightarrow{\ distr\ } L^* = (e^{\sigma Z})^*$, we have shown that $(e^{\sigma Z})^* = e^{\sigma(Z+\sigma)}$.
For the case where $L = e^{\sigma Z + \mu}$, the scaling relation $(cX)^* = cX^*$ for
constants $c > 0$ yields the formula for size biasing the lognormal in general:

$$(e^{\sigma Z + \mu})^* = e^{\sigma(Z+\sigma) + \mu}. \tag{33}$$
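
Formula (33) can also be checked against densities, as suggested in the exercise above. The sketch below assumes the standard lognormal density and the mean $\mathbb{E}e^{\sigma Z+\mu} = e^{\mu+\sigma^2/2}$; the parameter values and evaluation points are arbitrary. It compares the size biased density $x f(x)/\mathbb{E}L$ with the density of $e^{\sigma Z + \mu + \sigma^2}$.

```python
import math


def lognormal_density(x, sigma, mu):
    """Density of exp(sigma * Z + mu) at x > 0, for Z standard normal."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


sigma, mu = 0.8, 0.3                       # illustrative (assumed) parameters
mean = math.exp(mu + sigma ** 2 / 2)       # E exp(sigma * Z + mu)
xs = [0.1, 0.5, 1.0, 2.0, 5.0, 20.0]

# Ratio of the size biased density to the density claimed by (33), minus 1.
max_rel_gap = max(
    abs(x * lognormal_density(x, sigma, mu) / mean
        / lognormal_density(x, sigma, mu + sigma ** 2) - 1)
    for x in xs
)
```

The two densities agree up to rounding at every test point, in line with $(e^{\sigma Z+\mu})^* = e^{\sigma(Z+\sigma)+\mu}$.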



16 Examples 



In light of Theorem 11.1, for a nonnegative random variable $X$ with
finite, strictly positive mean, being able to satisfy $X^* = X + Y$ with
independence and $Y \ge 0$ is equivalent to being infinitely divisible. We
give examples of size biasing, first with examples that are not infinitely
divisible, then with examples that are.

16.1 Examples of size biasing without an independent increment

Both examples 1 and 2 below involve bounded, nonnegative random
variables. Observe that in general, the distributions of $X$ and $X^*$
have the same support, except that always $\mathbb{P}(X^* = 0) = 0$. This
immediately implies that if $X$ is bounded but not constant, then it
cannot satisfy (5).

Example 1. Bernoulli and binomial 

Let $B_i$ be Bernoulli with parameter $p \in (0,1]$, i.e. $B_i$ takes the
value $1$ with probability $p$, and the value $0$ with probability $1 - p$.
Clearly $B_i^* = 1$, since $\mathbb{P}(B_i^* = 1) = 1\cdot\mathbb{P}(B_i = 1)/\mathbb{E}B_i = 1$. If
$B_1, B_2, \ldots$ are independent, and $S_n = B_1 + \cdots + B_n$ we say that
$S_n \sim$ binomial$(n,p)$. We size bias $S_n$ by size biasing a single summand,
so $S_n^* = S_{n-1} + 1$, which for $p < 1$ cannot be expressed as $S_n + Y$ with
$S_n, Y$ independent!
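
The identity $S_n^* = S_{n-1} + 1$ can be verified exactly at the level of mass functions. The values of `n` and `p` in this sketch are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """List of P(S_n = k) for k = 0..n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


n, p = 6, Fraction(1, 3)                  # illustrative (assumed) values
f_n = binomial_pmf(n, p)
mean = n * p

size_biased = [k * f_n[k] / mean for k in range(n + 1)]   # law of S_n^*
shifted = [Fraction(0)] + binomial_pmf(n - 1, p)          # law of S_{n-1} + 1
```

Both lists describe the same distribution on $\{0, 1, \ldots, n\}$, with no mass at $0$.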

Note that letting $n \to \infty$ and $np \to \lambda$ in the relation $S_n^* = S_{n-1} + 1$
gives another proof that $X^* = X + 1$ when $X \sim \mathrm{Po}(\lambda)$, because both
$S_{n-1} \xrightarrow{\ distr\ } X$ and $S_n \xrightarrow{\ distr\ } X$. Here we have a family of examples
without independence, whose limit is the basic example with
independence.

Example 2. Uniform and Beta

The Beta distribution on $(0,1)$, with parameters $a, b > 0$ is specified
by saying that it has a density on $(0,1)$, proportional to $(1-x)^{a-1}x^{b-1}$.
The uniform distribution on $(0,1)$ is the special case $a = b = 1$ of this
Beta family. Using (3), if $X \sim$ Beta$(a,b)$, then $X^* \sim$ Beta$(a, b+1)$.
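
A pointwise check of the Beta claim, in the parametrization used here (density proportional to $(1-x)^{a-1}x^{b-1}$, so the mean is $b/(a+b)$), might look as follows. The parameter values are illustrative assumptions.

```python
import math


def beta_density(x, a, b):
    """Density on (0,1) proportional to (1-x)^(a-1) * x^(b-1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * (1 - x) ** (a - 1) * x ** (b - 1)


a, b = 2.5, 1.5                 # illustrative (assumed) parameters
mean = b / (a + b)              # mean under this parametrization

# Compare x * f(x) / EX with the Beta(a, b+1) density on a grid.
max_gap = max(
    abs(x * beta_density(x, a, b) / mean - beta_density(x, a, b + 1))
    for x in [0.1 * k for k in range(1, 10)]
)
```

Size biasing multiplies the density by $x$, which raises the exponent of $x$ by one and leaves the exponent of $1-x$ alone.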

There are many families of distributions for which size biasing 
simply changes the parameters; our examples are the Beta family in 
example 2, the negative binomial family in example 4, the Gamma 
family in example 5, and the lognormal family in example 6. In these 
families, either all members satisfy (5), or else none do. Thus it might
be tempting to guess that infinite divisibility is a property preserved 
by size biasing, but it ain't so. 
Example 3. X = 1 + W where W is Poisson 

We have $X^*$ is a mixture of $X + 0$ and $X + 1$, using (11) with
$X_1 = 1$, $X_1^* = X_1 + 0$ and $X_2 = W$, $X_2^* = W + 1$. That is, $X^*$ is
a mixture of $1 + W$, with weight $1/(1+\lambda)$, and $2 + W$, with weight
$\lambda/(1+\lambda)$. Elementary calculation shows that it is not possible to have
X* = X + Y with X, Y independent and Y > 0. Therefore X is not 
infinitely divisible. 

Since X = W* , we have an example in which W is infinitely divis- 
ible, but W* is not. 

16.2 Examples of X* = X + Y with independence

By Theorem 11.1, when $X$ satisfies $X^* = X + Y$ with $X, Y$ independent
and $Y \ge 0$, the distribution of $X$ is determined by the distribution
of $Y$ together with a choice for the constant $a \in (0,\infty)$ to serve as
$\mathbb{E}X$. Thus all our examples below, organized by a choice of $Y$, come
in one parameter families indexed by $a$, or if more convenient, by
something proportional to $a$; in these families, $X$ varies and $Y$ stays
constant!

Example 4. Y is 1 + geometric. X is geometric or negative
binomial

4a) The natural starting point is that you are given the geometric
distribution: $\mathbb{P}(X = j) = (1-q)q^j$ for $j \ge 0$, with $0 < q < 1$,
and you want to discover whether or not it is infinitely divisible.
Calculating the characteristic function,
$\phi(u) = \sum_{k \ge 0} e^{iuk}(1-q)q^k = (1-q)/(1 - qe^{iu})$, so

$$\log\phi(u) = \log(1-q) - \log(1 - qe^{iu}) = -\sum_{j \ge 1} \frac{q^j}{j} + \sum_{j \ge 1} \frac{q^j e^{iuj}}{j} = \sum_{j \ge 1} \frac{e^{iuj} - 1}{j}\,q^j.$$

Thus the geometric distribution has a Lévy representation in which
$\nu$ has mass $q^j$ at $j = 1, 2, \ldots$, so we have verified that the geometric
distribution is infinitely divisible. The total mass $a$ of $\nu$ is
$a = q + q^2 + \cdots = q/(1-q)$; and this agrees with the recipe $a = \mathbb{E}X$. Since
$\mathbb{P}(Y = j) = \nu(\{j\})/a = (1-q)q^{j-1}$ for $j = 1, 2, \ldots$, we have
$Y \stackrel{d}{=} 1 + X$. Thus $X^* = X + Y$ with $X, Y$ independent and $Y \stackrel{d}{=} X + 1$.

4b) Multiplying the Lévy measure $\nu$ by $t > 0$ yields the general case
of the negative binomial distribution, $X \sim$ negative binomial$(t,q)$.
The case $t = 1$ is the geometric distribution. We still have $X^* = X + Y$
with $X, Y$ independent, and $Y \sim$ geometric$(q) + 1$. Note that for
integer $t$ we can verify our calculation in another way, as in this case
$X$ is the sum of $t$ independent geometric$(q)$ variables $X_i$. By (12),
we can size bias $X$ by size biasing a single geometric term, which is
the same as adding an independent $Y$ with distribution, again,
$1 +$ geometric$(q)$.
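
For non-integer $t$ the claim in 4b) can still be checked numerically. The sketch below assumes the usual negative binomial mass function $\frac{\Gamma(t+k)}{\Gamma(t)\,k!}(1-q)^t q^k$ with mean $a = tq/(1-q)$; the parameter values and the helper name `negbin_pmf` are illustrative. It compares the size biased mass function with the convolution of $X$ and $Y = 1 +$ geometric$(q)$.

```python
import math


def negbin_pmf(k, t, q):
    """P(X = k) = Gamma(t+k) / (Gamma(t) k!) * (1-q)^t * q^k, for real t > 0."""
    log_p = (math.lgamma(t + k) - math.lgamma(t) - math.lgamma(k + 1)
             + t * math.log(1 - q) + k * math.log(q))
    return math.exp(log_p)


t, q = 2.5, 0.4                 # illustrative (assumed) parameters
a = t * q / (1 - q)             # a = EX

max_gap = 0.0
for k in range(1, 30):
    size_biased = k * negbin_pmf(k, t, q) / a
    # P(X + Y = k) with P(Y = y) = (1-q) q^(y-1) for y >= 1, independent of X.
    conv = sum(negbin_pmf(k - y, t, q) * (1 - q) * q ** (y - 1) for y in range(1, k + 1))
    max_gap = max(max_gap, abs(size_biased - conv))
```

The same increment $Y$ works for every $t$, while the law of $X$ changes with $t$.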

Example 5. Y is exponential. X is exponential or Gamma

5a) Let $X$ be exponentially distributed with $\mathbb{E}X = 1/\alpha$, i.e.
$\mathbb{P}(X > t) = e^{-\alpha t}$ for $t > 0$. As we saw in section 5 for the case
$\alpha = 1$, $X^* = X + Y$ with $X, Y$ independent and $Y \stackrel{d}{=} X$. The case
with general $\alpha > 0$ is simply a spatial scaling of the mean one,
"standard" case. The Lévy measure $\nu$ is simply $\mathbb{E}X = 1/\alpha$ times the
common distribution of $X$ and $Y$, with $\nu(dy) = e^{-\alpha y}\,dy$.

5b) Multiplying the Lévy measure $\nu$ by $t > 0$ yields the general case
of the Gamma distribution, $X \sim$ Gamma$(\alpha, t)$. The name comes from
the fact that $X$ has density $f(x) = (\alpha^t/\Gamma(t))\,x^{t-1}e^{-\alpha x}$ on $(0,\infty)$. The
special case $t = 1$ is the exponential distribution, and more generally
the case $t = n$ can be realized as $X = X_1 + \cdots + X_n$ where the $X_i$ are
iid, exponentially distributed with $\mathbb{E}X_i = 1/\alpha$. We have $X^* = X + Y$
with $X, Y$ independent and $Y$ exponentially distributed with mean
$1/\alpha$, so that $X^* \sim$ Gamma$(\alpha, t+1)$. The Lévy measure here is $\nu$ with
$\nu(dy) = te^{-\alpha y}\,dy$; so the corresponding $\mu$ has $\mu(dy) = (te^{-\alpha y}/y)\,dy$.
This form of $\mu$ is known as the Moran or Gamma subordinator; see
e.g. [13]. As in example 4b), for integer $t$ we can verify our calculation
by noting that $X$ is the sum of $t$ independent exponential, mean $1/\alpha$
variables, and that by (12), when size biasing we will get the same $Y$
added on to the sum as the $Y$ which appears when size biasing any
summand.
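
The Lévy measure in 5b) can be checked against the known Gamma characteristic function $(1 - iu/\alpha)^{-t}$. The sketch below is a rough numerical illustration, not a careful quadrature: it approximates the integral in (26) with $\nu(dy) = te^{-\alpha y}\,dy$ by a midpoint rule on a truncated range, and the step size, cutoff, and parameter values are assumptions.

```python
import cmath
import math


def log_cf_from_levy(u, alpha, t, step=1e-3, y_max=40.0):
    """Midpoint-rule value of the integral of (e^{iuy} - 1)/y * t*exp(-alpha*y) over (0, y_max)."""
    n = int(y_max / step)
    total = 0j
    for k in range(n):
        y = (k + 0.5) * step
        total += (cmath.exp(1j * u * y) - 1) / y * t * math.exp(-alpha * y)
    return total * step


def log_cf_gamma(u, alpha, t):
    """Logarithm of the Gamma(alpha, t) characteristic function (1 - iu/alpha)^(-t)."""
    return -t * cmath.log(1 - 1j * u / alpha)


alpha, t, u = 1.5, 2.5, 2.0     # illustrative (assumed) values
gap = abs(log_cf_from_levy(u, alpha, t) - log_cf_gamma(u, alpha, t))
```

The small residual `gap` reflects only the discretization and truncation of the integral.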
Example 6. Y is ??, X is lognormal

As mentioned in Section 15, we say that $X$ is lognormal when
$X = e^{\sigma Z + \mu}$ where $Z$ is a standard normal variable. The proof that
the lognormal is infinitely divisible, first given by Thorin [19], remains
difficult; there is an excellent book by Bondesson [7] for further study.
Consider even the standard case, $X = e^Z$, so that by equation (33),
$X^* = e^{Z+1} = eX$. The result of Thorin that this $X$ is infinitely
divisible is thus equivalent, by Theorem 11.1, to the claim that there
exists a distribution for $Y \ge 0$ such that with $X$ and $Y$ independent,
$X + Y \stackrel{d}{=} eX$. Also, by Theorem 11.1 with $a = \mathbb{E}X = \sqrt{e}$, the
distribution of $Y$ is exactly $1/\sqrt{e}$ times the Lévy measure $\nu$. However, there
does not seem to exist yet a simplified expression for this distribution!

Since the lognormal $X = e^Z$ satisfies $X^* = eX = X + (e-1)X$, it
provides a simple illustration of our earlier remarks that the relation
$X^* = X + Y$, $Y \ge 0$ does not determine the
distribution of $Y$ without the further stipulation that $X$ and $Y$ be
independent. Note also that in $X^* = X + (e-1)X$, the increment
$(e-1)X$ is a monotone function of $X$, so this is an example of the
coupling using the quantile transformation.

Example 7. Y is uniform on an interval $(\beta, \gamma)$, with $0 \le \beta < \gamma < \infty$.
By scaling space (dividing by $\gamma$) we can assume without loss
of generality that $\gamma = 1$. This still allows two qualitatively distinct
cases, depending on whether $\beta = 0$ or $\beta > 0$.

Example 7a. $\beta = 0$: Dickman's function and its convolution
powers.

With $a = \mathbb{E}X \in (0,\infty)$, this example is specified by (26) with $\nu$
being $a$ times the uniform distribution on $(0,1)$, so that $\mu(dx) = (a/x)\,dx$
on $(0,1)$. The reader must take on faith that $\nu$ having a density,
together with $\mu((0,\infty)) = \infty$ so that $\mathbb{P}(X = 0) = 0$, implies that the
distribution of $X$ has a density, call it $g_a$. Size biasing then gives an
interesting differential-difference equation for this density: using (32),
for $x > 0$,

$$g_a(x) = \frac{a}{x}\int_{y=0}^{1} g_a(x-y)\,dy = \frac{a}{x}\int_{x-1}^{x} g_a(z)\,dz. \tag{34}$$

Multiplying out gives $x\,g_a(x) = a\int_{x-1}^{x} g_a(z)\,dz$, and taking the
derivative with respect to $x$ yields $x\,g_a'(x) + g_a(x) = a\,g_a(x) - a\,g_a(x-1)$, so
that $g_a'(x) = ((a-1)\,g_a(x) - a\,g_a(x-1))/x$, for $x > 0$.

For the case $a = 1$ this simplifies to $g_1'(x) = -g_1(x-1)/x$, which is
the same differential-difference equation that is used to specify
Dickman's function $\rho$, of central importance in number theory; see [18].
The function $\rho$ is characterized by $\rho(x) = 1$ for $0 \le x \le 1$ and
$\rho'(x) = -\rho(x-1)/x$ for $x > 0$, and $\rho(x) = 0$ for $x < 0$, with $\rho$ continuous
on $[0,\infty)$, and from the calculation that $\int_0^\infty \rho(x)\,dx = e^{\gamma}$, where $\gamma$ is
Euler's constant, it follows that $g_1(x) = e^{-\gamma}\rho(x)$. Dickman's function
governs the distribution of the largest prime factor of a random integer
in the following sense: for fixed $u > 0$, the proportion of integers from
$1$ to $n$ whose largest prime factor is smaller than $n^{1/u}$ tends to $\rho(u)$
as $n \to \infty$. For example, $\rho(2)$ can be calculated from the differential
equation simply by

$$\rho(2) = \rho(1) + \int_1^2 \rho'(x)\,dx = 1 + \int_1^2 \frac{-\rho(x-1)}{x}\,dx = 1 + \int_1^2 \frac{-1}{x}\,dx = 1 - \log 2 \approx 1 - .69314 = .30686,$$

and the claim is that $\rho(2)$ gives, for large $n$, the approximate proportion
of integers from $1$ to $n$ all of whose prime factors are at most $\sqrt{n}$.

For general $t > 0$ the density $g_t$ is a "convolution power of
Dickman's function," see [12]. The size bias treatment of this first appeared
in the 1996 version of [2], and was subsequently written up in [1].
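
The differential-difference equation for $\rho$ is easy to integrate step by step. The following sketch is one simple way to do it, assuming a uniform grid and the trapezoid rule; the function name `dickman_rho` and the grid size are illustrative choices, not a method taken from the references.

```python
import math


def dickman_rho(x_max, steps_per_unit=1000):
    """Grid values rho(k / steps_per_unit), from rho = 1 on [0, 1] and rho'(x) = -rho(x-1)/x."""
    h = 1.0 / steps_per_unit
    n = int(round(x_max * steps_per_unit))
    rho = [1.0] * (n + 1)
    for k in range(steps_per_unit + 1, n + 1):
        x_prev, x = (k - 1) * h, k * h
        # Trapezoid rule for the integral of rho(s - 1)/s over [x_prev, x].
        f_prev = rho[k - 1 - steps_per_unit] / x_prev
        f_curr = rho[k - steps_per_unit] / x
        rho[k] = rho[k - 1] - 0.5 * h * (f_prev + f_curr)
    return rho


values = dickman_rho(3.0)
rho_2 = values[2000]     # should be close to 1 - log 2
rho_3 = values[3000]
```

On $[1,2]$ the scheme simply integrates $-1/x$, so `rho_2` reproduces $1 - \log 2$ up to the discretization error.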

Example 7b. $\beta > 0$: Buchstab's function, integers free of
small prime factors.

For these examples $Y$ is uniform on $(\beta, 1)$ for $\beta \in (0,1)$, with
density $1/(1-\beta)$ on $(\beta, 1)$. Therefore $\nu$ is a multiple of the uniform
distribution on $(\beta, 1)$, with density $t$ on $(\beta, 1)$ for some constant $t > 0$;
we have $a := \mathbb{E}X = t(1-\beta)$, but $t$ rather than $a$ is the convenient
parameter. From $\nu(dx) = t\,dx$ on $(\beta, 1)$ we get $\mu(dx) = (t/x)\,dx$ on
$(\beta, 1)$, so that the total mass of $\mu$ is $\lambda = \int_{(\beta,1)} t/x\,dx = t\log(1/\beta)$.
Since $\lambda < \infty$, $X$ is compound Poisson with $\mathbb{P}(X = 0) = e^{-\lambda} = \beta^t$.

For the case $t = 1$, the distribution of the random variable $X$ is
related to another important function in number theory, Buchstab's
function $\omega$; again see [18]. The relation involves a "defective density":
here $t = 1$ and $\mathbb{P}(X = 0) = \beta > 0$ so $X$ does not have a proper
density. Size biasing yields a relation similar to (32), which leads to a
differential-difference equation, which in turn establishes the relation
between the defective density and Buchstab's function; see [3]. The
net result is that for $\beta < a < b < 1$, $\mathbb{P}(a < X < b) = \int_a^b \omega(x/\beta)\,dx$.
Buchstab's function $\omega$ is characterized by the properties that it is
continuous on $(1,\infty)$, $\omega(u) = 1/u$ for $u \in [1,2]$, and $(u\,\omega(u))' = \omega(u-1)$
for $u > 2$. It governs the distribution of the smallest prime factor
of a random integer in the sense that for $u > 1$, the proportion of
integers from $1$ to $n$ whose smallest prime factor is at least $n^{1/u}$ is
asymptotic to $u\,\omega(u)/\log n$.